Import libraries and check versions.
In [1]:
import pandas as pd
import numpy as np
import sys
print('Python version ' + sys.version)
print('Pandas version ' + pd.__version__)
print('Numpy version ' + np.__version__)
Read the data and get a row count. Data source: U.S. Department of Transportation, TranStats database, Air Carrier Statistics Table T-100 Domestic Market (All Carriers), 2015: "This table contains domestic market data reported by both U.S. and foreign air carriers, including carrier, origin, destination, and service class for enplaned passengers, freight and mail when both origin and destination airports are located within the boundaries of the United States and its territories."
In [2]:
file_path = r'data\T100_2015.csv.gz'
df = pd.read_csv(file_path, header=0)
df.count()
Out[2]:
In [3]:
df.head(n=10)
Out[3]:
In [4]:
df = pd.read_csv(file_path, header=0, usecols=["PASSENGERS", "ORIGIN", "DEST"])
In [5]:
df.head(n=10)
Out[5]:
In [6]:
print('Min: ', df['PASSENGERS'].min())
print('Max: ', df['PASSENGERS'].max())
print('Mean: ', df['PASSENGERS'].mean())
In [7]:
df = df.query('PASSENGERS > 10000')
In [8]:
print('Min: ', df['PASSENGERS'].min())
print('Max: ', df['PASSENGERS'].max())
print('Mean: ', df['PASSENGERS'].mean())
In [9]:
OriginToDestination = df.groupby(['ORIGIN', 'DEST'], as_index=False).agg({'PASSENGERS': 'sum'})
OriginToDestination.head(n=10)
Out[9]:
In [10]:
OriginToDestination = pd.pivot_table(OriginToDestination, values='PASSENGERS', index=['ORIGIN'], columns=['DEST'], aggfunc='sum')
OriginToDestination.head()
Out[10]:
In [11]:
OriginToDestination = OriginToDestination.fillna(0)
OriginToDestination.head()
Out[11]:
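The matrix can now be indexed by origin (row) and destination (column) to look up the total passenger count for a route. The airport codes below are hypothetical placeholders; substitute any ORIGIN/DEST codes that appear in the data.
In [ ]:
# Hypothetical lookup: total 2015 passengers flown from 'ATL' to 'LAX'.
# Replace with any origin/destination codes present in the table.
OriginToDestination.loc['ATL', 'LAX']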
SymPy is a Python library for symbolic mathematics. It aims to become a full-featured computer algebra system (CAS) while keeping the code as simple as possible in order to be comprehensible and easily extensible.
In [13]:
import sympy
from sympy import *
from sympy.stats import *
from sympy import symbols
from sympy.plotting import plot
from sympy.interactive import printing
printing.init_printing(use_latex=True)
print('Sympy version ' + sympy.__version__)
This example was gleaned from: Rocklin, Matthew, and Andy R. Terrel. "Symbolic Statistics with SymPy." Computing in Science & Engineering 14.3 (2012): 88-93.
Problem: data assimilation -- we want to combine new measurements with an existing set of measurements, where both carry uncertainty. For example, ACS estimates updated with local data.
Assume we've estimated that the temperature outside is 30 degrees. However, there is certainly uncertainty in our estimate -- let's say ±3 degrees. In SymPy, we can model this with a normal random variable.
In [14]:
T = Normal('T', 30, 3)
What is the probability that the temperature is actually greater than 33 degrees?
We can use SymPy's integration engine to calculate a precise answer.
In [16]:
P(T > 33)
Out[16]:
In [17]:
N(P(T > 33))
Out[17]:
Assume we now have a thermometer and can measure the temperature. However, there is still uncertainty involved.
In [18]:
noise = Normal('noise', 0, 1.5)
observation = T + noise
We now have two measurements: our prior estimate of 30 ± 3 degrees and a thermometer reading of 26 degrees with ±1.5 degrees of noise. How do we combine them? We want to calculate a better estimate of the temperature (the posterior) given an observation of 26 degrees.
In [19]:
T_posterior = given(T, Eq(observation, 26))
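As a rough check, the posterior mean and standard deviation can be evaluated numerically with sympy.stats' E and std (both already imported above). This is a minimal sketch -- it assumes SymPy can evaluate the conditional expectation, which may take a moment. For reference, the closed-form Gaussian update gives approximately 26.8 ± 1.34 degrees: an estimate pulled toward the observation, with less uncertainty than either measurement alone.
In [ ]:
# Sketch: numerically evaluate the posterior mean and standard deviation.
# Symbolic conditioning can be slow to evaluate.
print('Posterior mean:', N(E(T_posterior)))
print('Posterior std: ', N(std(T_posterior)))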